Abstract: In recent years, deep convolutional neural networks have exhibited excellent performance in 
computer vision and have had a far-reaching impact. Traditional plant taxonomic identification requires high 
expertise and is time-consuming. Most nature reserves have problems such as incomplete species 
surveys, inaccurate taxonomic identification, and untimely updating of status data. Simple and accurate 
recognition of plant images can be achieved by applying convolutional neural network technology to 
explore the best network model. Taking 24 typical desert plant species that are widely distributed in the 
nature reserves in Xinjiang Uygur Autonomous Region of China as the research objects, this study 
established an image database and used deep learning to select the optimal network model for the image recognition of desert 
plant species, so as to provide decision support for fine management in the nature reserves in Xinjiang, such as 
species investigation and monitoring. Since desert plant species were not included in 
the public dataset, the images used in this study were mainly obtained through field shooting and 
downloaded from the Plant Photo Bank of China (PPBC). After the sorting process and statistical analysis, 
a total of 2331 plant images were finally collected (2071 images from field collection and 260 images from 
the PPBC), including 24 plant species belonging to 14 families and 22 genera. A large number of numerical 
experiments were also carried out to compare a series of 37 convolutional neural network models with 
good performance, from different perspectives, to find the optimal network model that is most suitable for 
the image recognition of desert plant species in Xinjiang. The results revealed 24 models with a recognition 
Accuracy of greater than 70.000%. Among them, Residual Network X_8GF (RegNetX_8GF) performed the 
best, with Accuracy, Precision, Recall, and F1 (which refers to the harmonic mean of the Precision and Recall 
values) values of 78.33%, 77.65%, 69.55%, and 71.26%, respectively. Considering the demand factors of 
hardware equipment and inference time, Mobile Network V2 (MobileNetV2) achieved the best balance among the Accuracy, 
the number of parameters, and the number of floating-point operations. The number of parameters for 
MobileNetV2 is about 1/16 of that for RegNetX_8GF, and the number of floating-point 
operations is about 1/25. Our findings can facilitate efficient decision-making for the management of species 
survey, cataloging, inspection, and monitoring in the nature reserves in Xinjiang, providing a scientific basis 
for the protection and utilization of natural plant resources. 
1 Introduction 


Wild plants constitute the main part of the ecosystem in nature reserves. Thus, the 
primary task for managers is to carry out species investigation and classification of wild 
plants (Li et al., 2020). Because it demands a high level of professional knowledge, traditional plant 
classification and recognition is time-consuming and inefficient (Cao et al., 2018). The existing 
species surveys in most nature reserves are not comprehensive, and the classification of plant 
species is not accurate. As a result, status data are not updated in time, and the 
administration agencies of nature reserves are unable to take timely and effective protection and 
management measures or countermeasures for ecological recovery (Liu et al., 2018; Wang et al., 
2019; Xiao, 2019). In June 2019, the General Office of the State Council of the People's Republic 
of China issued a document on establishing a nature reserve system with national parks as the main body. 
The document calls for using ecological environment regulation, big data platforms, and information 
technologies such as cloud computing and the Internet of Things to gain a comprehensive grasp of 
the composition, distribution, and dynamic change of nature reserve ecosystems, thus providing 
scientific support for the management decisions of nature reserves 
(http://www.gov.cn/zhengce/2019-06/26/content_5403497.htm). Therefore, how to use new information technology to obtain 
plant-related data more efficiently and accurately has become an urgent problem to be solved by 
researchers and managers. 

With their powerful feature extraction, convolutional neural networks have significant advantages in 
the recognition and analysis of high-dimensional data such as images, sounds, and texts. Applied to plant surveys, they 
can reduce the damage that collecting field specimens causes to fragile plant resources, decrease the difficulty of 
identifying and classifying similar plant species, and improve work efficiency (Mikolov et 
al., 2011). In recent years, intelligent recognition of plant images has gradually become a research 
hotspot (Liu, 2020). In the previous literature, convolutional neural networks were used to 
recognize the images of leaves, fruits, flowers, and other organs of plants under a simple 
background (Hall et al., 2015; Abdullahi et al., 2017; Bargoti and Underwood, 2017; Coulibaly et 
al., 2019; Cao et al., 2020). Researchers have classified and recognized the images of five crops 
and 100 different ornamental plant species in different nature scenes by convolutional neural 
network (Simonyan and Zisserman, 2014; Kussul et al., 2017; Liu, 2018). Even several mature 
convolutional neural network image recognition systems, such as "Xingse APP" and "Aiplants 
APP", have been widely used in the survey of wild plant resources (Gao et al., 2020). However, 
the accuracy of these plant image recognition systems is generally low in the classification and 
recognition of desert plant images under the conditions of complex nature scenes (Jin, 2020). For 
example, Zhang and Huai (2016) used hierarchical deep learning to train and recognize leaf 
images of plants in simple and complex scenes, and found that the recognition rate for 
simple scenes was as high as 91.11%, while the recognition rate for complex 
scenes was only 34.38%. The main problems are as follows. First, the number of images of the 
same desert plant species in different natural scenes is too small, and even fewer images 
focus on the salient classification characteristics of desert plant species. Second, previous 
image recognition systems are based on urban and rural cultivated plants, on a certain organ of 
plants, or on image datasets with simple backgrounds. However, in the evolutionary process of long-term 
adaptation to the special desert environment, the external morphology of different desert plant 
species has similar characteristics (homogenization of plant and branch characteristics, similarity 
of branch morphology and color, highly degraded leaf patterns, etc.), which increases the 
difficulty of machine learning visual recognition and makes it easier to produce misjudgment (He 
et al., 2006). To solve the first problem, researchers proposed the method of obtaining a large 
number of plant images conforming to technical requirements (Jin, 2020). The second problem is 
the key scientific and technical issue that this study needs to focus on: how to significantly 
improve the image recognition accuracy of similar plant species in complex nature scenes and 
select the optimal network model suitable for the image recognition of desert plant species, which 
is a very challenging task with broad practical application. 
In view of the lack of research on the image recognition of desert plant species, this study took 
the panoramic image set of major desert plant species distributed in nature reserves in Xinjiang 
Uygur Autonomous Region of China as the research object, and integrated 37 non-lightweight 
and lightweight models of eight categories, such as Visual Geometry Group Network (VGG), 
Residual Network (RegNet), and Mobile Network (MobileNet), which are widely used at present 
(Krizhevsky et al., 2012; He et al., 2016; Howard et al., 2017). By using grid search to find the 
optimal hyperparameters and comparing the performance, we discussed the optimal network 
model suitable for the image recognition of desert plant species, so as to achieve convenient and 
accurate classification and recognition of desert plant species, and provide a solution for 
large-scale field plant background investigation in nature reserves in Xinjiang in the future. 


2 Materials and methods 


2.1 General survey of nature reserves in Xinjiang 


At present, there are 201 protected nature reserves in Xinjiang, which cover an area of 2.51×10⁵ 
km², accounting for 15.07% of the total land area of Xinjiang (Fig. 1). Among them, there are one 
World Natural Heritage Site, 28 nature reserves, 24 scenic spots, 13 geological parks, 57 forest 
parks, 51 wetland parks, and 27 desert parks. In terms of regional distribution, there are 63 in 
southern Xinjiang and 138 in northern Xinjiang, accounting for 31.00% and 69.00% of the total 
number, respectively. 
Fig. 1 Overview of Xinjiang and spatial distribution of nature reserves in Xinjiang. Note that the figure is based 
on the standard map (新S(2021)023) of the Map Service System (https://xinjiang.tianditu.gov.cn/main/bzdt.html) 
marked by the Xinjiang Uygur Autonomous Region Platform for Common Geospatial Information Services, and 
the standard map has not been modified. Satellite image source: Geospatial Data Cloud (http://www.gscloud.cn/). 


2.2 Dataset 


Based on the "List of National Key Protected Wild Plants" (Ming, 2021), this study selected 24 
representative xerophytic desert plant species that are distributed in nature reserves in Xinjiang as 
the identification objects (Fig. 2). Since desert plant species were not included in the public 
dataset, the images were mainly obtained through field shooting and downloaded from the Plant 
Photo Bank of China (PPBC; http://ppbc.iplant.cn/sp/12519). The field collection extended 
from 2019 to 2021. Rangers in nature reserves were commissioned to take pictures with digital 
cameras or mobile phones in the natural environment. Those pictures were RGB true color images 
in JPG format. The collected plant images were confirmed by experienced plant experts and 
labeled manually. Unclear images were deleted. After the sorting process 
and statistical analysis, a total of 2331 plant images were finally collected (2071 images from 
field collection and 260 images from the PPBC), including 24 plant species belonging to 14 
families and 22 genera (Table 1). The training, validation, and test sets were allocated in a ratio of 
3:1:1. The plant species information can be found in the Flora of Xinjiang (Xinjiang Flora 
Editorial Committee, 1992–2004) and the Red List of Chinese Biodiversity: Higher Plant Volume 
(http://www.iplant.cn/rep/protlist/4). 
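
The 3:1:1 allocation described above can be illustrated with a minimal sketch. The text does not say whether the split was made per species or over the pooled images, so the per-class (stratified) split below is an assumption, and the file names and the function name `split_3_1_1` are hypothetical.

```python
import random
from collections import defaultdict


def split_3_1_1(samples, seed=0):
    """Split (path, label) pairs into train/val/test at roughly 3:1:1 per class."""
    by_label = defaultdict(list)
    for path, label in samples:
        by_label[label].append(path)

    rng = random.Random(seed)
    train, val, test = [], [], []
    for label, paths in sorted(by_label.items()):
        rng.shuffle(paths)
        n_val = len(paths) // 5           # one fifth for validation
        n_test = len(paths) // 5          # one fifth for testing
        n_train = len(paths) - n_val - n_test
        train += [(p, label) for p in paths[:n_train]]
        val += [(p, label) for p in paths[n_train:n_train + n_val]]
        test += [(p, label) for p in paths[n_train + n_val:]]
    return train, val, test


# Hypothetical file names; 16 and 249 are the smallest and largest
# per-species image totals reported in Table 1.
samples = [(f"00009/img_{i}.jpg", "00009") for i in range(16)]
samples += [(f"00024/img_{i}.jpg", "00024") for i in range(249)]
train, val, test = split_3_1_1(samples)
print(len(train), len(val), len(test))
```

With only 16 images, a species such as Ammodendron bifolium contributes just three test images under this scheme, which shows how sensitive per-class results can be for the rarest species.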


[Figure 2: photographs of the 24 species, labeled with the serial numbers (00001) to (00024) in the order listed in Table 1.] 

Fig. 2 Images of the selected 24 desert plant species in nature reserves in Xinjiang 


2.3 Methods 


A convolutional neural network, a branch of deep learning, is a feedforward neural 
network with a deep structure that contains convolutional computation. In recent years, 
it has been widely used in the field of image recognition (Lecun and Bengio, 1998). 
A convolutional neural network includes convolutional layers, pooling layers, and fully connected 
layers (Fig. 3). The mathematical expression of the network is as follows: 

$F(x) = f_N(f_{N-1}(\cdots f_1(x) \cdots))$, (1) 

where x represents the input image; F(x) represents the output of the network, such as the 
corresponding class or probability of the input image x; N represents the number of hidden layers; 
and $f_i$ represents the function of the corresponding layer i. 
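
As a minimal sketch of Equation 1, the network output can be read as successive application of the layer functions. The three toy layers below are illustrative stand-ins, not layers of any model used in this study, and the helper name `compose` is hypothetical.

```python
from functools import reduce


def compose(layers):
    """Return F such that F(x) = f_N(f_{N-1}(... f_1(x) ...))."""
    return lambda x: reduce(lambda value, layer: layer(value), layers, x)


# Toy stand-ins for f_1, f_2, f_3 (applied in this order).
layers = [
    lambda x: [2.0 * v for v in x],            # linear scaling
    lambda x: [max(0.0, v - 1.0) for v in x],  # shift followed by rectification
    lambda x: sum(x),                          # aggregation to a single score
]
F = compose(layers)
result = F([0.2, 1.5, -3.0])
print(result)
```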
Table 1 Basic information of the selected 24 desert plant species in nature reserves in Xinjiang 

Species name (serial number) | Family | Genus | Life form | Protection category | Field images | Total images 
Ephedra intermedia (00001) | Ephedraceae | Ephedra | Shrub | Second-class national protected plant in China | 100 | 114 
Iljinia regelii (00002) | Chenopodiaceae | Iljinia | Subshrub | - | 50 | 54 
Corydalis kashgarica (00003) | Papaveraceae | Corydalis | Perennial herb | - | 70 | 94 
Zygophyllum kaschgaricum (00004) | Zygophyllaceae | Zygophyllum | Shrub | Second-class protected plant in Xinjiang, China | 80 | 100 
Ammopiptanthus nanus (00005) | Fabaceae | Ammopiptanthus | Shrub | Second-class national protected plant in China | 70 | 73 
Oxytropis bogdoschanica (00006) | Fabaceae | Oxytropis | Perennial herb | - | 90 | 96 
Caragana polourensis (00007) | Fabaceae | Caragana | Shrub | - | 70 | 90 
Glycyrrhiza inflata (00008) | Fabaceae | Glycyrrhiza | Perennial herb | Second-class national protected plant in China | 50 | 72 
Ammodendron bifolium (00009) | Fabaceae | Ammodendron | Shrub | First-class protected plant in Xinjiang, China | 15 | 16 
Eremosparton songoricum (00010) | Fabaceae | Eremosparton | Shrub | Second-class protected plant in Xinjiang, China | 30 | 33 
Lagochilus lanatonodus (00011) | Lamiaceae | Lagochilus | Perennial herb | - | 30 | 33 
Frankenia pulverulenta (00012) | Frankeniaceae | Frankenia | Annual herb | Second-class national protected plant in China | 20 | 32 
Salsola junatovii (00013) | Chenopodiaceae | Salsola | Subshrub | - | 105 | 148 
Gymnocarpos przewalskii (00014) | Caryophyllaceae | Gymnocarpos | Subshrub | First-class national protected plant in China | 100 | 112 
Helianthemum songaricum (00015) | Cistaceae | Helianthemum | Shrub | Second-class national protected plant in China | 125 | 125 
Haloxylon persicum (00016) | Chenopodiaceae | Haloxylon | Tree | Second-class national protected plant in China | 60 | 63 
Caryopteris mongholica (00017) | Verbenaceae | Caryopteris | Shrub | - | 130 | 149 
Populus pruinosa (00018) | Salicaceae | Populus | Tree | First-class protected plant in Xinjiang, China | 50 | 50 
Tamarix taklamakanensis (00019) | Tamaricaceae | Tamarix | Shrub | Second-class national protected plant in China | 151 | 151 
Cistanche deserticola (00020) | Orobanchaceae | Cistanche | Perennial herb | Second-class national protected plant in China | 100 | 114 
Calligonum ebinuricum (00021) | Polygonaceae | Calligonum | Shrub | Second-class protected plant in Xinjiang, China | 105 | 105 
Prunus tenella (00022) | Rosaceae | Prunus | Tree | First-class protected plant in Xinjiang, China | 20 | 31 
Haloxylon ammodendron (00023) | Chenopodiaceae | Haloxylon | Tree | Second-class national protected plant in China | 220 | 227 
Populus euphratica (00024) | Salicaceae | Populus | Tree | - | 230 | 249 

Note: Values in the parentheses represent the serial numbers and correspond to Figure 2. -, no protection level. Protection category was 
referred from the Information System of Chinese Rare and Endangered Plants (ISCREP; https://www.iplant.cn/rep/protlist/3). 
[Figure 3: diagram of a network consisting of a convolutional layer, a pooling layer, a fully connected layer, and a softmax layer.] 

Fig. 3 Schematic diagram of convolutional neural network 


In the convolutional layer, f consists of multiple convolution kernels ($g_1$, ..., $g_{k-1}$, $g_k$), and the 
common convolution kernel sizes are 1×1, 3×3, 5×5, and so on. Each $g_k$ represents a linear 
function in the k-th kernel, which can be expressed as follows: 

$g_k(x, y, z) = \sum_{u=-m}^{m} \sum_{v=-n}^{n} \sum_{w=-d}^{d} W_k(u, v, w)\, I(x-u, y-v, z-w)$, (2) 

where (x, y, z) represents the position of the pixel in the input image I; $W_k(u, v, w)$ represents the 
weight of the kernel k; and m, n, and d represent the height, width, and depth of the convolution 
kernel, respectively. 
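
The sum in Equation 2 can be sketched in plain Python for a single-channel image, that is, for the special case d = 0. Zero padding at the image border is an assumption of this sketch, and the function name `conv2d` is hypothetical.

```python
def conv2d(image, kernel):
    """Eq. 2 with depth d = 0: g(x, y) = sum_u sum_v W(u, v) * I(x - u, y - v)."""
    m = len(kernel) // 2       # kernel rows span u = -m..m
    n = len(kernel[0]) // 2    # kernel columns span v = -n..n
    height, width = len(image), len(image[0])
    out = [[0.0] * width for _ in range(height)]
    for x in range(height):
        for y in range(width):
            total = 0.0
            for u in range(-m, m + 1):
                for v in range(-n, n + 1):
                    xi, yj = x - u, y - v
                    # Pixels outside the image are treated as zero.
                    if 0 <= xi < height and 0 <= yj < width:
                        total += kernel[u + m][v + n] * image[xi][yj]
            out[x][y] = total
    return out


image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
box = [[1, 1, 1],
       [1, 1, 1],
       [1, 1, 1]]
identity = [[0, 0, 0],
            [0, 1, 0],
            [0, 0, 0]]
summed = conv2d(image, box)
print(summed[1][1], summed[0][0])
```

A 3×3 kernel of ones sums each pixel's neighbourhood, while the identity kernel returns the image unchanged, which is a quick way to check the index arithmetic.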

In the activation layer, f is a pixel-wise nonlinear function, that is, a rectified linear unit, which 
can be represented by the following equation: 

f(x) = max(0, x). (3) 

In the pooling layer, f is a layer-wise nonlinear down-sampling function, which aims to 
gradually reduce the size of the feature representation. 
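
A minimal sketch of the activation in Equation 3 followed by a pooling step is given below. The text only states that pooling is a nonlinear down-sampling function, so the choice of non-overlapping 2×2 max pooling is an assumption, and the function names are hypothetical.

```python
def relu(feature_map):
    """Apply f(x) = max(0, x) to every element (Eq. 3)."""
    return [[max(0.0, v) for v in row] for row in feature_map]


def max_pool(feature_map, size=2):
    """Non-overlapping size x size max pooling (stride equal to size)."""
    h, w = len(feature_map), len(feature_map[0])
    return [
        [max(feature_map[i + di][j + dj]
             for di in range(size) for dj in range(size))
         for j in range(0, w - size + 1, size)]
        for i in range(0, h - size + 1, size)
    ]


fm = [[-1.0, 2.0, 0.5, -3.0],
      [4.0, -2.0, 1.0, 0.0],
      [0.0, 1.0, -1.0, -1.0],
      [3.0, -5.0, -2.0, 6.0]]
pooled = max_pool(relu(fm))
print(pooled)
```

The 4×4 feature map is reduced to 2×2, which illustrates how pooling shrinks the feature representation from one layer to the next.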

The fully connected layer can also be considered as a convolutional layer with a kernel size of 
1×1. In classification tasks, a prediction layer (i.e., softmax layer) is usually added after the last 
fully connected layer to calculate the probability that the input image belongs to each of the 
different classes. For instance, if the number of neurons in the prediction layer is C (that is, the 
number of categories is C) with outputs $p_1, p_2, \ldots, p_C$, these C values can be converted to probability 
values through the softmax layer (Eq. 4): 

$p_i = \frac{e^{p_i}}{\sum_{j=1}^{C} e^{p_j}} \quad (i = 1, 2, \ldots, C)$, (4) 

where the left-hand side denotes the converted probability, so that $\sum_{i=1}^{C} p_i = 1$. Finally, the loss function is calculated, and the parameters are 
updated through the stochastic gradient descent algorithm. The cross-entropy loss function is one 
of the most commonly used loss functions in deep learning, which can measure the difference 
between the true value and the predicted value (Li et al., 2020). It is calculated as follows: 

$loss = -\sum_{i=1}^{C} y_i \log(p_i)$, (5) 

where $y_i$ and $p_i$ represent the expected label value and the predicted output value for class i, 
respectively. 
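
Equations 4 and 5 can be illustrated with a short sketch. The three scores are made-up values for a three-class example, and subtracting the maximum score before exponentiation is a standard numerical-stability step that does not change the result of Equation 4.

```python
import math


def softmax(scores):
    """Convert raw prediction-layer outputs to probabilities (Eq. 4)."""
    shift = max(scores)  # subtracting the maximum avoids overflow in exp
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(one_hot, probs):
    """Cross-entropy loss for a one-hot label vector (Eq. 5)."""
    return -sum(y * math.log(p) for y, p in zip(one_hot, probs) if y > 0)


scores = [2.0, 1.0, 0.1]          # made-up outputs for C = 3 classes
probs = softmax(scores)
loss = cross_entropy([1, 0, 0], probs)
print(probs, loss)
```

For a one-hot label, the loss reduces to the negative logarithm of the probability assigned to the true class, so it shrinks toward zero as that probability approaches 1.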

In general, the performance of convolutional neural network becomes better as the number of 
network layers deepens, such as VGG with 16 layers, Google Inception Network (GoogLeNet) 
with 22 layers, and Residual Network (ResNet) with 152 layers (Simonyan and Zisserman, 2014). 
However, research shows that no network structure can be guaranteed to outperform other 
network structures on any dataset (Liu and Luo, 2019). For a specific dataset, it is necessary to 
select the network structure with the best performance according to the experimental results. 
Therefore, this study adopts eight categories (including VGG, ResNet, Dense Convolutional 
Network (DenseNet), Squeeze Network (SqueezeNet), MobileNet, Shuffle Network (ShuffleNet), 
Efficient Network (EfficientNet), and RegNet) of 37 common non-lightweight and lightweight 
network structures, and adjusts the model parameters to find the best performing network 
structure. The experimental environment was as follows: Intel(R) Xeon(R) CPU E5-2640 v3 @ 2.60 GHz, 
NVIDIA GeForce RTX 2080 Ti, and Ubuntu 18.04.1. The PyTorch 1.6 deep learning framework was 
used, and the batch sizes were set to 4, 8, 16, and 32. Using the Stochastic Gradient Descent 
(SGD) optimization algorithm (Li et al., 2021), we tuned the learning 
rate and set the momentum to 0.9 and the weight decay rate to 0.005. 
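
The role of the momentum and weight decay settings can be sketched with one common formulation of the SGD update, in which the decay term is added to the gradient before the momentum buffer is updated. This is an illustrative sketch in plain Python rather than the training code of this study; the learning rate of 0.01 and the toy objective are assumptions, since the learning rate value is not stated in the text.

```python
def sgd_step(weights, grads, velocity, lr, momentum=0.9, weight_decay=0.005):
    """One SGD update with momentum and L2 weight decay:
    g = grad + weight_decay * w;  v = momentum * v + g;  w = w - lr * v."""
    new_weights, new_velocity = [], []
    for w, g, v in zip(weights, grads, velocity):
        g = g + weight_decay * w
        v = momentum * v + g
        new_velocity.append(v)
        new_weights.append(w - lr * v)
    return new_weights, new_velocity


# Toy objective f(w) = w^2 with gradient 2w; lr = 0.01 is a hypothetical value.
w, v = [1.0], [0.0]
for _ in range(200):
    w, v = sgd_step(w, [2.0 * w[0]], v, lr=0.01)
print(w[0])
```

The momentum buffer accumulates past gradients, and the weight decay term pulls each weight toward zero at every step.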


2.4 Accuracy evaluation 


In this study, the Accuracy, Precision, Recall, and F1 (which refers to the harmonic mean of the 
Precision and Recall values) were used to evaluate the model results (Cai, 2020). Accuracy 
is the ratio of correctly classified samples to the total number of 
samples. Precision is the proportion of samples predicted to be positive that are actually positive. Recall is 
the proportion of all positive samples that are predicted to be positive. 


$\mathrm{Accuracy} = \frac{TP + TN}{TP + FP + TN + FN} \times 100$, (6) 

$\mathrm{Precision} = \frac{TP}{TP + FP} \times 100$, (7) 

$\mathrm{Recall} = \frac{TP}{TP + FN} \times 100$, (8) 

$\mathrm{F1} = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}} \times 100$, (9) 

where TP, TN, FP, and FN represent the numbers of true positive, true negative, false positive, 
and false negative samples in the prediction results, respectively. 
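
The evaluation indices can be computed from a confusion matrix as in the following sketch. The two-class matrix is a toy example whose counts were chosen for illustration, not data from this study; Precision and Recall are computed as fractions and scaled to percentages only when returned.

```python
def per_class_metrics(confusion, k):
    """Precision, Recall, and F1 (all in %) for class k.
    Rows of `confusion` are true classes; columns are predicted classes."""
    n = len(confusion)
    tp = confusion[k][k]
    fn = sum(confusion[k]) - tp
    fp = sum(confusion[r][k] for r in range(n)) - tp
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return 100 * precision, 100 * recall, 100 * f1


def accuracy(confusion):
    """Share of correctly classified samples, in %."""
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    return 100 * correct / sum(map(sum, confusion))


# Toy matrix: class 0 has TP = 20, FN = 2, and FP = 6.
confusion = [[20, 2],
             [6, 30]]
p, r, f1 = per_class_metrics(confusion, 0)
acc = accuracy(confusion)
print(round(p, 3), round(r, 3), round(f1, 3), round(acc, 3))
```

A class with no predicted samples gives TP + FP = 0; the sketch returns 0 in that case, which is one common convention and would produce rows of 0.000 like those seen for very small classes.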

The complexity of different models is measured by the number of parameters and the number 
of floating-point operations (Shen, 2021). The number of parameters refers to the total number of 
parameters that need to be trained in the network model, which is used to measure the size of the 
model. The number of floating-point operations refers to the number of floating-point operations 
per second, which can be used to measure the algorithm complexity. The higher the number of 
floating-point operations, the slower the operation speed of convolutional neural network. 


3 Results 


3.1 Recognition results and comparative analysis of multiple models in the image recognition 
of desert plant species 


The model recognition results of plant images are presented in Table 2. Thirteen models had an 
Accuracy below 70.000%, of which the following three were below 55.000%: 
SqueezeNet1_0, SqueezeNet1_1, and ShuffleNetV2_X0_5. Twenty-four models had an Accuracy 
exceeding 70.000%, of which the following nine reached at least 75.000%: 
EfficientNet_B1, EfficientNet_B3, RegNetX_400MF, RegNetX_800MF, RegNetX_3_2GF, 
RegNetX_8GF, RegNetX_16GF, RegNetY_3_2GF, and RegNetY_16GF. RegNetX_8GF 
outperformed the other networks, with Accuracy, Precision, Recall, and F1 values of 78.333%, 
77.654%, 69.547%, and 71.256%, respectively. 

In addition to the above results, we also compared the number of parameters and the number of 
floating-point operations for the different network structures. For the number of parameters, there 
were 16 models smaller than 10.000 M (megabyte, which refers to the storage space occupied by 
model parameters; 1 M=1024 kilobytes) and four models larger than 100.000 M (VGG11, 
VGG13, VGG16, and VGG19). For the number of floating-point operations, there were 15 
models smaller than 1.000 G (model operation speed; 1 G=10⁹/s). We used two indicators to 
quantify the relationships of the Accuracy with the number of parameters and the number of 
floating-point operations (Fig. 4). Amongst the models with an Accuracy higher than 70.000%, 


LI Jicai et al.: Image recognition and empirical application of desert plant species... 


MobileNetV2 achieves the best balance among the Accuracy, the number of parameters, and the 
number of floating-point operations. For MobileNetV2, the Accuracy reaches 71.429%, the 
number of parameters is only 2.255 M, the number of floating-point operations is only 0.313 G, 
the ratio of Accuracy to the number of parameters is 31.676, and the ratio of Accuracy to the 
number of floating-point operations is 228.206. Although RegNetX_8GF exhibited the best 
performance, the number of parameters and the number of floating-point operations were 16 and 
25 times higher, respectively, compared to MobileNetV2. 
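
The two trade-off indicators can be recomputed from the Table 2 values as in the sketch below. Only the rows for MobileNetV2 and RegNetX_8GF are copied here, and small differences in the last digit relative to the reported ratios are expected because the table values are rounded.

```python
# (Accuracy %, Params M, FLOPs G) copied from Table 2.
models = {
    "MobileNetV2": (71.429, 2.255, 0.313),
    "RegNetX_8GF": (78.333, 37.698, 8.021),
}


def ratios(name):
    """Accuracy per unit of parameters and per unit of floating-point operations."""
    acc, params, flops = models[name]
    return acc / params, acc / flops


acc_per_param, acc_per_flop = ratios("MobileNetV2")

# How many times larger RegNetX_8GF is than MobileNetV2.
param_factor = models["RegNetX_8GF"][1] / models["MobileNetV2"][1]
flop_factor = models["RegNetX_8GF"][2] / models["MobileNetV2"][2]

print(round(acc_per_param, 3), round(acc_per_flop, 3))
print(round(param_factor, 1), round(flop_factor, 1))
```

Extending the dictionary with the remaining rows of Table 2 would allow all 37 models to be ranked by either indicator.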


Table 2 Experimental results of 37 different models used in the image recognition of desert plant species 


Model Accuracy (%) Precision (%) Recall (%) F1 (%) Params (M) FLOPs (G) 
VGG11 69.286 68.746 59.769 61.909 128.865 7.613 
VGG13 67.857 64.700 58.584 58.310 129.049 11.317 
VGG16 65.952 57.170 51.073 51.356 134.359 15.480 
VGG19 66.667 67.341 54.269 56.148 139.669 19.643 
ResNet18 72.857 75.693 61.621 64.463 11.189 1.819 
ResNet34 72.857 74.432 61.448 64.091 21.297 3.671 
ResNet50 73.333 69.447 63.473 65.239 23.557 4.110 
ResNext50_32_4xD 72.857 71.907 61.531 63.486 23.029 4.257 
ResNet101 73.095 66.146 60.646 61.639 42.549 7.832 
ResNext101_32_8xD 72.857 73.280 61.886 63.927 86.792 16.475 
DenseNet121 65.000 58.949 53.998 54.663 6.978 2.865 
DenseNet169 65.476 55.327 54.604 53.783 12.524 3.396 
DenseNet201 65.952 59.139 58.259 57.824 18.139 4.339 
SqueezeNet1_0 54.286 35.847 34.203 33.751 0.748 0.739 
SqueezeNet1_1 53.095 37.520 35.006 33.772 0.735 0.267 
MobileNetV2 71.429 64.504 59.919 60.613 2.255 0.313 
MobileNetV3-Small 61.429 56.082 51.884 52.685 1.542 0.058 
MobileNetV3-Large 64.286 60.755 55.215 56.053 4.233 0.224 
ShuffleNetV2_X0_5 53.095 30.013 33.321 30.101 0.366 0.042 
ShuffleNetV2_X1_0 60.476 35.104 40.711 36.959 1.278 0.148 
EfficientNet_B0 73.810 67.480 62.850 63.810 4.038 0.400 
EfficientNet_B1 75.238 75.122 64.517 66.999 6.544 0.591 
EfficientNet_B2 74.286 70.714 64.487 66.095 7.735 0.681 
EfficientNet_B3 75.714 72.308 65.204 66.359 10.733 0.992 
EfficientNet_B4 72.381 68.769 64.278 64.878 17.592 1.543 
RegNetX_400MF 76.429 73.074 67.472 68.470 5.105 0.420 
RegNetX_800MF 75.000 72.684 65.218 67.185 6.603 0.809 
RegNetX_1_6GF 73.333 70.136 64.869 66.038 8.299 1.618 
RegNetX_3_2GF 75.000 70.812 64.682 65.630 14.312 3.198 
RegNetX_8GF 78.333 77.654 69.547 71.256 37.698 8.021 
RegNetX_16GF 75.714 75.210 64.439 66.525 52.279 15.990 
RegNetY_400MF 74.762 72.230 65.505 67.071 3.914 0.410 
RegNetY_800MF 74.286 70.751 64.156 65.073 5.666 0.845 
RegNetY_1_6GF 73.571 70.069 63.407 64.851 10.335 1.629 
RegNetY_3_2GF 76.191 70.237 65.338 65.848 17.960 3.200 
RegNetY_8GF 74.762 70.841 64.068 64.830 37.413 8.515 
RegNetY_16GF 75.000 73.363 66.207 68.029 80.638 15.960 


Note: F1 refers to the harmonic mean of the Precision and Recall values. Params, the number of parameters; FLOPs, the number of 
floating-point operations. M, megabyte, which refers to the storage space occupied by model parameters (1 M=1024 kilobytes); G, 
model operation speed (1 G=10⁹/s). VGG, Visual Geometry Group Network; ResNet, Residual Network; DenseNet, Dense 
Convolutional Network; SqueezeNet, Squeeze Network; MobileNet, Mobile Network; ShuffleNet, Shuffle Network; EfficientNet, 
Efficient Network; RegNet, Residual Network. 


202212.00077v1 


chinaXiv 




JOURNAL OF ARID LAND 
[Figure 4: scatter plots of Accuracy (%) for the 37 models against (a) the number of parameters (M) and (b) the number of floating-point operations (G).] 

Fig. 4 Relationships of the Accuracy with the number of parameters (a) and the number of floating-point 
operations (b) for 37 different models used in the image recognition of desert plant species. M, megabyte, which 
refers to the storage space occupied by model parameters (1 M=1024 kilobytes); G, model operation speed (1 
G=10⁹/s). VGG, Visual Geometry Group Network; ResNet, Residual Network; DenseNet, Dense Convolutional 
Network; SqueezeNet, Squeeze Network; MobileNet, Mobile Network; ShuffleNet, Shuffle Network; 
EfficientNet, Efficient Network; RegNet, Residual Network. 

3.2 Optimal model for the image recognition of desert plant species 


According to the comparative analysis of the above results and considering factors such as 
hardware equipment and inference time, MobileNetV2 exhibited the best comprehensive 
performance for the image recognition of desert plant species and had better application prospects 
in practical work. The classification results of MobileNetV2 and RegNetX_8GF are shown in 
Table 3, and the confusion matrix is shown in Figure 5. 


Table 3 Classification results of MobileNetV2 and RegNetX_8GF in the image recognition of desert plant 
species 


                                  MobileNetV2                           RegNetX_8GF 
Species name                      Precision (%)  Recall (%)  F1 (%)     Precision (%)  Recall (%)  F1 (%) 
Ephedra intermedia (00001) 68.182 68.182 68.182 76.923 90.909 83.333 
Iljinia regelii (00002) 60.000 54.546 57.143 66.667 54.546 60.000 
Corydalis kashgarica (00003) 75.000 46.154 57.143 100.000 38.462 55.556 
Zygophyllum kaschgaricum (00004) 43.750 43.750 43.750 60.000 93.750 73.171 
Ammopiptanthus nanus (00005) 90.909 83.333 86.957 73.333 91.667 81.482 
Oxytropis bogdoschanica (00006) 58.333 70.000 63.636 50.000 50.000 50.000 
Caragana polourensis (00007) 52.632 71.429 60.606 76.923 71.429 74.074 
Glycyrrhiza inflata (00008) 80.000 61.539 69.565 80.000 61.539 69.565 
Ammodendron bifolium (00009) 0.000 0.000 0.000 0.000 0.000 0.000 
Eremosparton songoricum (00010) 100.000 33.333 50.000 100.000 33.333 50.000 
Lagochilus lanatonodus (00011) 75.000 50.000 60.000 83.333 83.333 83.333 
Frankenia pulverulenta (00012) 0.000 0.000 0.000 100.000 50.000 66.667 
Salsola junatovii (00013) 51.220 75.000 60.870 78.261 64.286 70.588 
Gymnocarpos przewalskii (00014) 72.727 76.191 74.419 78.261 85.714 81.818 
Helianthemum songaricum (00015) 95.833 92.000 93.878 100.000 88.000 93.617 
Haloxylon persicum (00016) 25.000 10.000 14.286 50.000 50.000 50.000 
Caryopteris mongholica (00017) 82.857 82.857 82.857 85.714 85.714 85.714 
Populus pruinosa (00018) 50.000 30.000 37.500 100.000 50.000 66.667 
Tamarix taklamakanensis (00019) 73.077 73.077 73.077 73.077 73.077 73.077 
Cistanche deserticola (00020) 95.238 90.909 93.023 95.455 95.455 95.455 
Calligonum ebinuricum (00021) 75.000 57.143 64.865 80.952 80.952 80.952 
Prunus tenella (00022) 75.000 100.000 85.714 100.000 100.000 100.000 
Haloxylon ammodendron (00023) 75.000 75.000 75.000 70.175 83.333 76.191 
Populus euphratica (00024) 73.333 93.617 82.243 84.615 93.617 88.889 


Because few images were available for Ammodendron bifolium, the results obtained after the images were 
divided into the training, validation, and test sets vary greatly and have no 
analytical value. For the remaining 23 plant species, the Precision of 
MobileNetV2 was higher than 60.000% for all plant species except Oxytropis bogdoschanica 
and Haloxylon persicum. Therefore, MobileNetV2 was able to identify 
various types of plant species well and had a high Accuracy. For incorrect classifications, it can 
be seen from the confusion matrix that one Caragana polourensis image, one Helianthemum 
songaricum image, one Tamarix taklamakanensis image, and two Corydalis kashgarica images 
were recognized as Oxytropis bogdoschanica images. Additionally, one Salsola junatovii image, 
one Populus pruinosa image, one Tamarix taklamakanensis image, one Calligonum ebinuricum 
image, and one Haloxylon ammodendron image were recognized as Haloxylon persicum images. 
The Recall of Eremosparton songoricum was the lowest, at only 33.000%. The confusion matrix 
shows that two Eremosparton songoricum images in the test set were recognized as Haloxylon 
ammodendron. The Recall of Corydalis kashgarica was the next lowest, at 
38.500%. The confusion matrix shows that, in the test set, one image of 
Corydalis kashgarica was recognized as each of Ammopiptanthus nanus, Lagochilus 
lanatonodus, and Haloxylon ammodendron, two images were recognized as Oxytropis 
bogdoschanica, and three images were predicted as Caryopteris mongholica. 

Upon inspecting the original images of the wrongly classified plant species (Fig. 6), it can be 
seen that the images show the overall shape of the desert plants, that the leaves are highly 
degraded (scaly or cylindrical) owing to long-term adaptation to the harsh environment, 
or that the plant shape is approximately spherical. The plant images of Eremosparton songoricum 
and Haloxylon persicum in spring and summer are very similar in branch shape, branch color, and 
branching pattern. For example, the images taken from the vertical view of Corydalis kashgarica, 
Ammopiptanthus nanus, Lagochilus lanatonodus, Haloxylon ammodendron, Oxytropis 
bogdoschanica, and Caryopteris mongholica are nearly spherical. Desert plants have different 
types of adhesives for stem smoothness or leaf distribution, and the leaves are scaly or cylindrical, 
leading to significantly different taxonomic characteristics. In the process of computer vision 
recognition, the fine-grained recognition of these fine attributes is not clear, resulting in low 
similarity of external morphological features, low image recognition sensitivity, and high false 
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Fig. 5 Confusion matrix of MobileNetV2 (a) and RegNetX_8GF (b) in the image recognition of desert plant 
species. The plant species corresponding to the labels are consistent with those in Figure 2. 


Eremosparton songoricum Haloxyon persicum 
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(00003) (00005) (00011) (00023) (00006) (00017) 


Fig. 6 Images of incorrectly classified samples with high similarity of external morphological features 
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positive rate of higher plants. This shows that the performance of MobileNetV2 still needs to be 
strengthened in recognizing similar but different species of plants, and the image dataset needs to 
be improved. From the values of F1 it can be seen that the performances of Eremosparton 
songoricum, Corydalis kashgarica, Oxytropis bogdoschanica, and Haloxylon persicum are not 
good enough, which are affected by low Precision and Recall values. Considering all the factors, 
plants with flowers and fruits or crown shapes and color, such as Populus euphratica, Cistanche 
deserticola, Calligonum ebinuricum, Prunus tenella, Lagochilus lanatonodus, and Caryopteris 
mongholica, obviously differ from the others without these characteristics in the images and have 
better recognition performances (all the indicators exceeding 80.000%). In conclusion, without 
the intervention of experts, the lightweight network MobileNetV2 achieves the automatic 
classification of plant images accurately and quickly. 


3.3 Verifying the research 


To verify the validity of the optimal model identified in this study, we selected the Tianchi 
Bogda Peak Nature Reserve and the Ebinur Lake Wetland National Nature Reserve, where there 
are many desert plants, for empirical verification. The Ebinur Lake Wetland National Nature 
Reserve gathers more than 90.00% of the plant species in the deserts of the Junggar Basin, and 
some endangered and endemic species are also distributed there. It is one of the regions with the 
most abundant desert plant populations in inland river basins in China, and the plant species 
here account for about 64.00% of the country's total desert plant species (Yang et al., 2009). The 
Tianchi Bogda Peak Nature Reserve covers the area where the main peak of the eastern Tianshan 
Mountains, Bogda Peak, is located. Within a horizontal distance of 80 km from south to north, it 
has a complete spectrum of mountain vertical zones. With about 700 plant species, the area is the 
most typical representative of mountain vertical zonation in the world's temperate arid regions 
and is included in the network of the UNESCO Man and the Biosphere Programme (Su and Niu, 
2016). Of the 24 desert plant species selected in this study, six are present in the Ebinur Lake 
Wetland National Nature Reserve, and nine are distributed in the Tianchi Bogda Peak Nature 
Reserve. 

The empirical results of MobileNetV2 and RegNetX_8GF in the image recognition of desert 
plant species are shown in Table 4. It can be seen that in the image recognition of desert plant 
species in the Tianchi Bogda Peak Nature Reserve, the Accuracy, Precision, and Recall of both 
models reached 83.000% or more. The Accuracy of MobileNetV2 is 83.871%, which is about 5.5 
percentage points higher than that of RegNetX_8GF in the image recognition of the 24 desert 
plant species (Accuracy of 78.333%). MobileNetV2 thus has high accuracy in the image 
recognition of desert plant species and good application prospects in practical work. In the image 
recognition of desert plant species in the Ebinur Lake Wetland National Nature Reserve, each 
evaluation indicator also exceeded 60.000% for the two models. Both MobileNetV2 and 
RegNetX_8GF thus achieved reasonably high accuracy values. The performance of the two 
models was then compared between the two nature reserves. In terms of evaluation indicators, 
the empirical identification results differed between the reserves: the values of the evaluation 
indicators in the Ebinur Lake Wetland National Nature Reserve were lower than those in the 
Tianchi Bogda Peak Nature Reserve. There may be a variety of reasons for this. The images of the 
nine desert plant species in the Tianchi Bogda Peak Nature Reserve were all obtained from the 
"Color Atlas of Wild Vascular Bundle Plants in Bogda Biosphere" (Su and Niu, 2016), and the 
pictures had also been screened and cleaned. The images of the six desert plant species in the 
Ebinur Lake Wetland National Nature Reserve were taken in the field without cleaning or other 
processing. These differences in image source and processing account for the gap between the 
two sets of results. The comparison also illustrates the importance of the quality and quantity of 
image recognition datasets to network models, and implies that no single network structure can 
be guaranteed to outperform the others on every dataset. For a specific dataset, we need to 
conduct experiments and select the network structure with the best performance based on the 
experimental results. This is also the practical significance of this research. 


Table 4 Performances of empirical application of MobileNetV2 and RegNetX_8GF in the image recognition of 
desert plant species in the Tianchi Bogda Peak Nature Reserve and Ebinur Lake Wetland National Nature Reserve 


Nature reserve                                 Model          Accuracy (%)   Precision (%)   Recall (%)   F1 (%) 
Tianchi Bogda Peak Nature Reserve              MobileNetV2    83.871         90.516          83.987       86.715 
                                               RegNetX_8GF    86.559         95.299          88.644       91.508 
Ebinur Lake Wetland National Nature Reserve    MobileNetV2    64.865         71.466          69.065       68.123 
                                               RegNetX_8GF    60.360         77.317          63.309       67.368 
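
As a quick arithmetic check on Table 4, the harmonic mean of the tabulated Precision and Recall 
of MobileNetV2 in the Tianchi Bogda Peak Nature Reserve is about 87.13%, slightly above the 
tabulated F1 of 86.715%. One explanation consistent with this gap, offered here as an assumption 
rather than something stated in the text, is that each indicator is averaged over species (macro 
averaging), in which case the mean of the per-species F1 values generally differs from the 
harmonic mean of the averaged Precision and Recall. The sketch below illustrates the effect; the 
per-species values and all function names are hypothetical.

```python
def harmonic_mean(precision, recall):
    """F1 as the harmonic mean of one precision value and one recall value."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


def macro_scores(per_class):
    """Average precision, recall, and F1 over classes.

    per_class is a list of (precision, recall) pairs, one per species.
    """
    n = len(per_class)
    macro_p = sum(p for p, _ in per_class) / n
    macro_r = sum(r for _, r in per_class) / n
    macro_f1 = sum(harmonic_mean(p, r) for p, r in per_class) / n
    return macro_p, macro_r, macro_f1


# Values copied from Table 4 (MobileNetV2, Tianchi Bogda Peak Nature Reserve).
table_precision, table_recall, table_f1 = 90.516, 83.987, 86.715
hm_of_table_values = harmonic_mean(table_precision, table_recall)
print(f"harmonic mean of tabulated P and R: {hm_of_table_values:.3f}")
print(f"tabulated F1: {table_f1:.3f}")

# Hypothetical two-species example showing that the two definitions can differ.
per_class = [(1.0, 0.5), (0.6, 0.9)]
macro_p, macro_r, macro_f1 = macro_scores(per_class)
print(f"mean of per-species F1: {macro_f1:.4f}")
print(f"harmonic mean of macro P and macro R: {harmonic_mean(macro_p, macro_r):.4f}")
```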


4 Discussion 


The Accuracy of MobileNetV2, the optimal model screened in this study for the image recognition 
of desert plant species, did not reach the level of more than 90.000% reported for the image 
recognition of plant species against a single background (Zhang and Huai, 2016), indicating that 
the findings of this study are still some way from practical application. 

Firstly, from the perspective of constructing the image dataset, increasing the amount of image 
data and enhancing the image quality help to improve the image recognition accuracy (Li, 2022). 
This study is based on 2331 plant images for model training and testing. In addition, because of 
the large difference in pixel size between the plant images collected in the field (using mobile 
phones and cameras) and those from the PPBC, the performance of the model classifier is 
affected to some extent. In the future, transfer learning, data expansion, and image cleaning 
technology can solve the problems of insufficient data and different image standards to a certain 
extent. As for the problem of complex backgrounds in the images, the image recognition results 
in the Tianchi Bogda Peak Nature Reserve (Table 4) show that the Accuracy can reach more than 
80.000% if the object is in focus and its features are prominent in the image. Barbedo (2016) also 
demonstrated that removing the background of an image can improve the image recognition 
Accuracy by 3.000%. However, background removal requires a lot of work by professionals, 
which is often difficult to achieve in practical applications. In theory, the more distinguishing 
features extracted from an image, the higher the Accuracy of image recognition (Gai et al., 2021). 
The leaves, flowers, and fruits of plants clearly offer multiple shape features that are easy to 
recognize and highly discriminative. Future research can make full use of multi-feature fusion of 
panoramic images of plants and images of organs (such as flowers, fruits, and leaves) to further 
improve the accuracy and sensitivity of the models. 

Secondly, from the perspective of data processing and analysis, fine-grained or new network 
structures can be considered to learn more expressive deep features. The misclassified plant 
images (Fig. 6) show that, because of the similar morphological characteristics of desert plant 
species, the misjudgment rate is still high. On the one hand, the complexity of the collection 
environment causes uncertainty in expert labeling. However, Bekker and Goldberger (2016) 
verified that a deep convolutional network can remain highly reliable when the number of 
mislabeled samples is not very high. On the other hand, some image recognition errors probably 
occur because the model neglects, or is unable to distinguish, certain subtle attribute features 
during learning. For such extreme cases that cannot be effectively distinguished visually, prior 
knowledge should be combined to make decisions. How to introduce the existing plant family 
and genus classification labels as prior information, so as to improve the generalization ability of 
the neural network and make it more suitable for the image recognition of desert plant species in 
Xinjiang, will be one of the next research topics (Cao et al., 2018). 
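
One simple way to bring genus labels in as prior information is sketched below. It is an 
illustration under stated assumptions, not a method used or proposed in this study: species 
probabilities are summed within each genus, and the genus-level answer is returned when no 
single species is predicted with enough confidence. The genus is taken as the first word of the 
species name; the probabilities, the 0.5 threshold, and the function names are hypothetical.

```python
from collections import defaultdict


def genus_of(species_name):
    """Return the genus, i.e. the first word of a binomial species name."""
    return species_name.split()[0]


def genus_probabilities(species_probs):
    """Sum the predicted species probabilities within each genus."""
    totals = defaultdict(float)
    for species, prob in species_probs.items():
        totals[genus_of(species)] += prob
    return dict(totals)


def predict_with_genus_fallback(species_probs, threshold=0.5):
    """Return ("species", name) if the top species is confident enough,
    otherwise ("genus", name) for the most probable genus."""
    best_species = max(species_probs, key=species_probs.get)
    if species_probs[best_species] >= threshold:
        return ("species", best_species)
    genus_probs = genus_probabilities(species_probs)
    best_genus = max(genus_probs, key=genus_probs.get)
    return ("genus", best_genus)


# Hypothetical softmax output for one image of a Haloxylon plant.
probs = {
    "Haloxylon persicum": 0.40,
    "Haloxylon ammodendron": 0.35,
    "Eremosparton songoricum": 0.20,
    "Tamarix taklamakanensis": 0.05,
}
print(predict_with_genus_fallback(probs))
```

In this hypothetical case neither Haloxylon species reaches the threshold on its own, but 
together they carry most of the probability, so the sketch reports the genus and leaves the final 
species decision to an expert or to additional features.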

Thirdly, in terms of the learning algorithm, the current parameter adjustment of convolutional 
neural networks basically relies on experience and practical operation, which requires constant 
training, parameter tuning, and repeated trial and error, consuming a lot of time and energy 
(Tang, 2020). Automated machine learning (AutoML) has emerged as a popular research field in 
recent years (Liu and Luo, 2019). It automatically builds a network structure that can achieve the 
same accuracy as classic manually selected networks. Applied to species identification in nature 
reserves, it is expected to overcome the subjective bias of manual selection, select a better 
network structure objectively, and improve the image recognition accuracy. 

Finally, from the perspective of the scope of application of the models, rare desert plant species 
also include Betula halophila, Reaumuria kaschgarica, etc. (Yin, 1991). However, because of the 
difficulty of collection, only some desert plant species in Xinjiang were selected as the research 
objects in this study. In addition, this study is based on the processing and analysis of static 
image data. At present, a large number of video surveillance systems have been deployed in the 
nature reserves in Xinjiang. Therefore, strengthening research on the recognition of video image 
data is also an important direction for the future. 


5 Conclusions 


Based on image processing and deep learning technology, this study adopted 37 commonly used 
non-lightweight and lightweight convolutional neural network models of eight categories to 
recognize the images of 24 desert plant species typically distributed in Xinjiang. The results show 
that there are 24 models with Accuracy above 70.000% and nine models with Accuracy above 
75.000%. Among them, RegNetX_8GF performs better than the other network models. 
The Accuracy, Precision, Recall, and F1 values of RegNetX_8GF are 78.333%, 77.654%, 
69.547%, and 71.256%, respectively, which meet the requirements of conventional image 
recognition. By further examining the relationships of the Accuracy with the number of 
parameters and the number of floating-point operations in the models with Accuracy higher than 
70.000%, we found that MobileNetV2 achieves the best balance among the Accuracy, the number 
of parameters, and the number of floating-point operations. The number of parameters of 
MobileNetV2 is 1/16 that of RegNetX_8GF, and its number of floating-point operations is 1/24. 
Considering hardware equipment, inference time, and other factors, MobileNetV2 has the best 
overall performance in the image recognition of desert plant species and is more suitable for 
field investigation. To verify the effectiveness of this study, we empirically tested 
RegNetX_8GF and MobileNetV2 in the image recognition of desert plant species in the Tianchi 
Bogda Peak Nature Reserve and the Ebinur Lake Wetland National Nature Reserve, and found 
that MobileNetV2 has good application prospects in practical work. 

Owing to the limitations of the image datasets, the image recognition accuracy still needs to be 
improved. In future research, we will further enrich the image sets of desert plant species in 
Xinjiang in multiple ways and forms, optimize the convolutional neural network model, 
improve the test accuracy, and provide solutions for the administration agencies of nature 
reserves to carry out large-scale field plant background investigation, so as to improve work 
efficiency and decision-making ability. 